AITopics | outlier fraction

Collaborating Authors

outlier fraction

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

PICZL: Image-based Photometric Redshifts for AGN

Roster, William, Salvato, Mara, Krippendorf, Sven, Saxena, Aman, Shirley, Raphael, Buchner, Johannes, Wolf, Julien, Dwelly, Tom, Bauer, Franz E., Aird, James, Ricci, Claudio, Assef, Roberto J., Anderson, Scott F., Liu, Xin, Merloni, Andrea, Weller, Jochen, Nandra, Kirpal

arXiv.org Machine LearningNov-13-2024

Computing photo-z for AGN is challenging, primarily due to the interplay of relative emissions associated with the SMBH and its host galaxy. SED fitting methods, effective in pencil-beam surveys, face limitations in all-sky surveys with fewer bands available, lacking the ability to capture the AGN contribution to the SED accurately. This limitation affects the many 10s of millions of AGN clearly singled out and identified by SRG/eROSITA. Our goal is to significantly enhance photometric redshift performance for AGN in all-sky surveys while avoiding the need to merge multiple data sets. Instead, we employ readily available data products from the 10th Data Release of the Imaging Legacy Survey for DESI, covering > 20,000 deg$^{2}$ with deep images and catalog-based photometry in the grizW1-W4 bands. We introduce PICZL, a machine-learning algorithm leveraging an ensemble of CNNs. Utilizing a cross-channel approach, the algorithm integrates distinct SED features from images with those obtained from catalog-level data. Full probability distributions are achieved via the integration of Gaussian mixture models. On a validation sample of 8098 AGN, PICZL achieves a variance $\sigma_{\textrm{NMAD}}$ of 4.5% with an outlier fraction $\eta$ of 5.6%, outperforming previous attempts to compute accurate photo-z for AGN using ML. We highlight that the model's performance depends on many variables, predominantly the depth of the data. A thorough evaluation of these dependencies is presented in the paper. Our streamlined methodology maintains consistent performance across the entire survey area when accounting for differing data quality. The same approach can be adopted for future deep photometric surveys such as LSST and Euclid, showcasing its potential for wide-scale realisation. With this paper, we release updated photo-z (including errors) for the XMM-SERVS W-CDF-S, ELAIS-S1 and LSS fields.

artificial intelligence, deep learning, machine learning, (20 more...)

arXiv.org Machine Learning

doi: 10.1051/0004-6361/202452361

2411.07305

Country:

North America > Canada > Ontario > Toronto (0.14)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
North America > United States > Arizona (0.04)
(29 more...)

Genre:

Overview (0.92)
Research Report > New Finding (0.46)

Industry:

Energy (1.00)
Government (0.67)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Exploring the Impact of Outlier Variability on Anomaly Detection Evaluation Metrics

Ok, Minjae, Klüttermann, Simon, Müller, Emmanuel

arXiv.org Artificial IntelligenceSep-24-2024

Anomaly detection is a dynamic field, in which the evaluation of models plays a critical role in understanding their effectiveness. The selection and interpretation of the evaluation metrics are pivotal, particularly in scenarios with varying amounts of anomalies. This study focuses on examining the behaviors of three widely used anomaly detection metrics under different conditions: F1 score, Receiver Operating Characteristic Area Under Curve (ROC AUC), and Precision-Recall Curve Area Under Curve (AUCPR). Our study critically analyzes the extent to which these metrics provide reliable and distinct insights into model performance, especially considering varying levels of outlier fractions and contamination thresholds in datasets. Through a comprehensive experimental setup involving widely recognized algorithms for anomaly detection, we present findings that challenge the conventional understanding of these metrics and reveal nuanced behaviors under varying conditions. We demonstrated that while the F1 score and AUCPR are sensitive to outlier fractions, the ROC AUC maintains consistency and is unaffected by such variability. Additionally, under conditions of a fixed outlier fraction in the test set, we observe an alignment between ROC AUC and AUCPR, indicating that the choice between these two metrics may be less critical in such scenarios. The results of our study contribute to a more refined understanding of metric selection and interpretation in anomaly detection, offering valuable insights for both researchers and practitioners in the field.

anomaly detection, outlier fraction, roc auc, (11 more...)

arXiv.org Artificial Intelligence

2409.15986

Country:

Europe > Germany > North Rhine-Westphalia > Arnsberg Region > Dortmund (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)

Genre: Research Report > New Finding (0.94)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)

Add feedback

About Test-time training for outlier detection

Klüttermann, Simon, Müller, Emmanuel

arXiv.org Artificial IntelligenceApr-4-2024

In this paper, we introduce DOUST, our method applying test-time training for outlier detection, significantly improving the detection performance. After thoroughly evaluating our algorithm on common benchmark datasets, we discuss a common problem and show that it disappears with a large enough test set. Thus, we conclude that under reasonable conditions, our algorithm can reach almost supervised performance even when no labeled outliers are given.

algorithm, anomaly, dataset, (16 more...)

arXiv.org Artificial Intelligence

2404.03495

Country:

Europe > Germany (0.14)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)
(2 more...)

Genre:

Research Report (1.00)
Overview (0.67)

Industry: Health & Medicine > Therapeutic Area (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)
Information Technology > Data Science > Data Mining > Anomaly Detection (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)

Add feedback

Provable Dynamic Robust PCA or Robust Subspace Tracking

Narayanamurthy, Praneeth, Vaswani, Namrata

arXiv.org Machine LearningOct-22-2017

Dynamic robust PCA refers to the dynamic (time-varying) extension of the robust PCA (RPCA) problem. It assumes that the true (uncorrupted) data lies in a low-dimensional subspace that can change with time, albeit slowly. The goal is to track this changing subspace over time in the presence of sparse outliers. This work provides the first guarantee for dynamic RPCA that holds under weakened versions of standard RPCA assumptions and a few other simple assumptions. We analyze a novel algorithm based on the recently introduced Recursive Projected Compressive Sensing (ReProCS) framework. Our result is significant because (i) it removes the strong assumptions needed by the two previous complete guarantees for ReProCS-based algorithms; (ii) it shows that, it is possible to achieve significantly improved outlier tolerance by exploiting slow subspace change and a lower bound on most outlier magnitudes; and (iii) it proves that the proposed algorithm is online (after initialization), fast, and, has near-optimal storage complexity.

artificial intelligence, chd, machine learning, (16 more...)

arXiv.org Machine Learning

1705.08948

Country: North America > United States (0.27)

Genre: Research Report > New Finding (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.92)

Add feedback

Outlier Robust Online Learning

Feng, Jiashi, Xu, Huan, Mannor, Shie

arXiv.org Machine LearningJan-1-2017

We consider the problem of learning from noisy data in practical settings where the size of data is too large to store on a single machine. More challenging, the data coming from the wild may contain malicious outliers. To address the scalability and robustness issues, we present an online robust learning (ORL) approach. ORL is simple to implement and has provable robustness guarantee -- in stark contrast to existing online learning approaches that are generally fragile to outliers. We specialize the ORL approach for two concrete cases: online robust principal component analysis and online linear regression. We demonstrate the efficiency and robustness advantages of ORL through comprehensive simulations and predicting image tags on a large-scale data set. We also discuss extension of the ORL to distributed learning and provide experimental evaluations.

artificial intelligence, geometric median, machine learning, (15 more...)

arXiv.org Machine Learning

1701.00251

Genre: Research Report (0.64)

Industry: Education > Educational Setting > Online (0.62)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.35)

Add feedback

Distributed Robust Learning

Feng, Jiashi, Xu, Huan, Mannor, Shie

arXiv.org Machine LearningFeb-7-2015

We propose a framework for distributed robust statistical learning on {\em big contaminated data}. The Distributed Robust Learning (DRL) framework can reduce the computational time of traditional robust learning methods by several orders of magnitude. We analyze the robustness property of DRL, showing that DRL not only preserves the robustness of the base robust learning method, but also tolerates contaminations on a constant fraction of results from computing nodes (node failures). More precisely, even in presence of the most adversarial outlier distribution over computing nodes, DRL still achieves a breakdown point of at least $ \lambda^*/2 $, where $ \lambda^* $ is the break down point of corresponding centralized algorithm. This is in stark contrast with naive division-and-averaging implementation, which may reduce the breakdown point by a factor of $ k $ when $ k $ computing nodes are used. We then specialize the DRL framework for two concrete cases: distributed robust principal component analysis and distributed robust regression. We demonstrate the efficiency and the robustness advantages of DRL through comprehensive simulations and predicting image tags on a large-scale image set.

algorithm, artificial intelligence, machine learning, (15 more...)

arXiv.org Machine Learning

1409.5937

Genre: Research Report (0.50)

Industry: Education (0.46)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Add feedback